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Abstract 


In  this  just  completed  research  program,  we  developed  several  new  directions  for  our  work  in  controlled 
active  vision.  We  have  developed  a  general  framework  for  geometric  observer-like  structures  based  on  non- 
parametric  implicit  (level  set)  curve  descriptions  of  dynamically  varying  shapes.  Special  emphasis  was 
given  to  the  geometric  nature  of  the  dynamical  system  as  well  as  the  key  issue  of  robustness.  In  particular, 
we  formulated  an  approach  to  the  problem  of  information  transport  and  filtering  from  a  measurement  curve 
to  an  estimated  curve. 

We  also  formulated  a  methodology  to  assess  measurement  reliability  that  allows  for  the  selection  of 
a  local  observer-gain.  We  should  note  that  the  dynamical  nature  of  the  evolving  curves  described  implic¬ 
itly  allows  for  the  observation  of  objects  changing  topology  (i.e.,  objects  breaking  and  merging  during 
propagation)  for  which  shape  priors  can  be  naturally  incorporated.  The  proposed  observer  structure  is  con¬ 
tinuous/discrete,  with  continuous-time  system  dynamics  and  discrete-time  measurements.  Its  state  space 
consists  of  an  estimated  curve  position  augmented  by  additional  states  (e.g.,  velocities)  associated  with 
every  point  on  the  estimated  curve.  Multiple  simulation  models  were  proposed  for  state  prediction.  Mea¬ 
surements  are  performed  through  segmentation  algorithms  and  optical  flow  computations. 

In  this  framework,  we  may  naturally  incorporate  several  different  tools  such  as  particle  filtering  as  well 
as  various  segmentation  procedures.  Accordingly,  we  described  a  novel  information-based  segmentation 
algorithm,  which  because  of  its  local/global  nature  seems  ideal  for  tracking  and  was  combined  with  particle 
filtering  for  robust  tracking  in  our  geometric  observer  approach. 
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1  Introduction 


The  problem  and  need  for  robust  visual  tracking  algorithms  is  widespread  for  both  both  military  and 
civilian  applications.  Of  particular  relevance  to  the  Air  Force  is  tracking  in  tactical  directed-energy  en¬ 
gagements.  Here  one  requires  designating  an  aimpoint  on  complex,  resolved  targets  in  the  presence  of 
clutter,  and  maintaining  the  high-energy  laser  (HEL)  on  the  aimpoint  in  the  presence  of  target  motion  and 
atmospheric-induced  aberrations.  To  date,  tracking  has  been  performed  with  2D  target  images,  but  the 
recent  development  of  angle-angle -range  LADAR  with  high  range  resolution  has  enabled  the  possibility  of 
using  3D  target  images  for  aimpoint  selection  and  tracking.  The  visual  tracking  methodology  developed  in 
this  work  has  proven  to  be  very  useful  for  such  applications. 

The  overall  technical  approach  uses  a  deterministic  observer  framework  for  visual  tracking  based  on 
non-parametric  implicit  (level  set)  curve  descriptions.  The  observer  is  continuous/discrete,  with  continuous¬ 
time  system  dynamics  and  discrete-time  measurements.  Its  state  space  consists  of  an  estimated  curve  po¬ 
sition  augmented  by  additional  states  (e.g.,  velocities)  associated  with  every  point  on  the  estimated  curve. 
Several  simulation  models  may  be  used  for  state  prediction,  and  measurements  may  be  performed  utilizing 
various  segmentation  techniques  and  optical  flow  computations.  Special  emphasis  is  given  to  the  geometric 
formulation  of  the  overall  dynamical  system.  The  discrete-time  measurements  lead  to  the  problem  of  geo¬ 
metric  curve  interpolation  and  the  discrete-time  filtering  of  quantities  propagated  along  with  the  estimated 
curve.  Interpolation  and  filtering  are  intimately  linked  to  the  correspondence  problem  between  curves. 
Correspondences  are  established  by  a  Laplace  equation  approach.  The  proposed  scheme  is  implemented 
completely  implicitly  (by  Eulerian  numerical  solutions  of  transport  equations)  and  thus  naturally  allows  for 
topological  changes  and  subpixel  accuracy  on  the  computational  grid.  It  may  be  combined  with  geometric 
particle  filtering  as  well  as  knowledge-based  segmentation. 

The  geometric  observer  structure  developed  in  this  AFOSR  sponsored  research  program  is  flexible 
enough  to  entertain  the  case  where  filtering  position  information  is  not  utilized  and  may  be  replaced  by 
static  position  measurements  in  case  of  clearly  segmentable  image  data,  leading  to  reduced  order  observers 
whose  associated  state  information  still  needs  to  be  filtered.  Specifically,  we  have  formulated  the  following 
novel  techniques  in  the  past  three  years: 

(i)  Geometric  Particle  Filtering  for  Visual  Tracking'.  Since  filtering  plays  such  an  important  role  in  our 
observer  theory,  and  since  we  intend  to  use  the  geometric  observers  in  conjunction  with  active  con¬ 
tours,  we  have  proposed  a  scheme  that  combines  the  advantages  of  particle  filtering  and  geometric 
active  contours  realized  via  level  set  models  for  tracking  deformable  objects.  We  have  investigated 
certain  modifications  to  the  standard  particle  filter  (PF)  [15,  3]  as  follows.  First,  we  used  an  im¬ 
portance  sampling  (IS)  density  [3]  which  can  be  understood  as  an  approximation  to  the  optimal  IS 
density  when  the  optimal  density  is  multi-modal.  Next,  we  replaced  IS  by  deterministic  assignment 
when  the  variance  of  the  IS  density  is  very  small  (which  occurs  when  the  local  deformation  is  small). 
Consequently,  we  are  actually  only  sampling  on  the  6-dimensional  space  of  affine  deformations, 
while  approximating  local  deformation  by  the  mode  of  its  posterior.  This  is  what  makes  the  our 
PF  algorithm  implementable  in  real  time.  (The  full  space  of  contour  deformations  is  theoretically 
infinite.) 

(ii)  Information-Theoretic  Approaches  to  Segmentation'.  Segmentation  is  essential  to  our  visual  track¬ 
ing  framework.  We  therefore  have  developed  a  geometric  contour  based  segmentation  procedure 
that  naturally  fits  into  our  geometric  framework.  We  found  that  an  active  contour  flow  derived  from 
an  information-theoretic  based  criterion  constituted  a  very  reasonable  approach  in  this  regard.  More 
specifically,  we  have  proposed  an  active  contour  model  whose  evolution  is  driven  by  the  gradient 
flow  derived  from  an  energy  functional  that  is  based  on  the  Bhattacharyya  distance.  The  approach 
can  be  viewed  as  a  generalization  of  those  segmentation  methods,  in  which  the  active  contours  max¬ 
imize  the  difference  between  a  finite  number  of  empirical  moments  of  the  distributions  “inside”  and 
“outside”  the  evolving  contour.  The  model  is  very  versatile  and  flexible  since  it  allows  one  to  easily 
accommodate  a  number  of  diverse  image  features.  Further  it  can  incorporate  both  local  and  global 
information. 
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2  Summary  of  Work 

We  summarize  some  of  the  key  results  developed  as  part  of  our  AFOSR  research  program. 

2.1  Geometric  Observers 

In  [35],  we  formulated  a  general  framework  for  geometric  observer-like  structures  based  on  non-parametric 
implicit  (level  set)  curve  descriptions  [38,  39].  Special  emphasis  was  given  to  the  geometric  nature  of  the 
dynamical  system  as  well  as  the  key  issue  of  robustness.  In  particular,  we  formulated  an  approach  to 
the  problem  of  information  transport  and  filtering  from  a  measurement  curve  to  an  estimated  curve.  We 
proposed  a  way  to  assess  measurement  reliability  that  allows  for  the  selection  of  a  local  observer-gain. 
We  should  also  note  that  the  dynamical  nature  of  the  evolving  curves  described  implicitly  also  allows  for 
the  observation  of  objects  changing  topology  (i.e.,  objects  breaking  and  merging  during  propagation)  for 
which  shape  priors  can  be  naturally  incorporated. 

This  framework  also  fits  in  very  naturally  with  geometric  statistically  based  approaches  for  detection 
and  identification;  see,  e.g.  [61,  45,  46,  47]  and  the  references  therein.  Indeed,  one  can  apply  the  theory  of 
Hidden  Markov  Models  (HMM)  for  modelling  the  system  dynamics  and  a  particle  filter  to  track  the  state 
as  sketched  in  Section  2.2.  Shape  and  motion  parameters  may  be  included  in  the  hidden  state  vector  (see 
our  discussion  below). 

2.1.1  Observers  for  Tracking 

The  filtering  of  sensed  data  is  a  practical  necessity  when  using  the  data  to  inform  a  feedback  process.  Real 
world  signals  derived  from  sensors  have  noise  and  disturbances  that  must  be  treated  prior  to  incorporating 
the  data  into  the  feedback  loop.  Visual  sensors  are  fundamentally  different  from  traditional  sensors  (e.g., 
gyros,  accelerometers,  range  sensors,  GPS)  in  the  sense  that  the  true  output  for  use  in  the  feedback  loop  is 
usually  not  directly  obtained  from  the  sensor  proper,  but  is  extracted  using  some  computer  vision  algorithm. 

Filtering  methodologies  may  be  divided  in  three  broad  (not  necessarily  mutually  exclusive)  subcate¬ 
gories: 

(1)  Pre-filtering:  Direct  filtering  of  the  image  information  obtained  from  the  vision  sensors,  followed 
by  the  application  of  a  computer  vision  algorithm. 

(2)  Internal-state-filtering:  Filtering  the  internal  states  associated  with  a  computer  vision  algorithm, 
based  on  the  image  information  obtained  from  the  vision  sensors. 

(3)  Post-filtering:  Direct  application  of  a  computer  vision  algorithm  to  the  data  obtained  from  the  vision 
sensors,  followed  by  the  filtering  of  the  output  of  the  computer  vision  algorithm. 

To  illustrate  the  difference  between  these  approaches,  assume  the  objective  is  to  find  the  centroid  of 
a  moving  object  given  noisy  image  information  from  a  vision  sensor.  For  (1),  the  images  are  spatio- 
temporally  filtered,  the  object  is  extracted  from  a  filtered  image,  and  the  centroid  is  computed  given  the 
extracted  object  (the  segmentation).  For  (3),  the  object  is  segmented  from  a  static  image,  the  centroid  is 
computed,  and  the  centroid  position  is  filtered  given  the  centroids  from  previous  image  frames.  For  (2), 
the  object  itself  is  modeled  dynamically  (system  states  being  position,  shape,  velocity,  etc.),  the  object 
states  are  filtered  based  on  the  spatio-temporal  image  information  of  the  vision  sensor,  and  the  centroid 
is  extracted  from  the  internal  state  of  the  modeled  object.  All  methods  may  use  a  model  for  the  expected 
motion,  resulting  in  model-based  filtering  methodologies.  Post-filtering  may  be  regarded  as  a  subclass 
of  internal-state-filtering,  where  the  state-space  is  identical  to  the  output  space  of  the  computer  vision 
algorithm  (e.g.,  the  centroid  position).  In  turn,  this  means  that  internal-state-filtering  describes  a  richer 
class  of  systems. 

Observers  are  internal-state-filters.  They  are  a  classical  concept  in  control  and  estimation  theory,  where 
system  states  need  to  be  reconstmcted  from  measurement  data.  Examples  include  the  classical  determin¬ 
istic  Luenberger  observer,  the  Kalman  filter  and  its  derivatives  (the  unscented  Kalman  filter,  the  extended 
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Kalman  filter),  as  well  as  particle  filters;  see  [48]  and  the  references  therein.  They  all  share  the  common 
observer  ingredients: 

(Ol)  a  dynamical  state  model  and  a  measurement  model  of  the  system  to  be  observed  for  state  prediction, 

(02)  a  measurement  methodology  (e.g.,  a  device  to  measure  velocity,  a  thermometer,  an  object  segmen¬ 
tation,  etc.), 

(03)  and  an  error  correction  scheme  to  reconcile  measurement  and  prediction  to  form  the  state  estima¬ 
tion. 

Irrespective  of  the  observer  similarities  (01)-(03),  observer  approaches  differ  in  terms  of  the  system 
and  measurement  class  for  which  they  are  designed,  and  the  estimation  method  being  employed.  System 
dynamics  and  system  measurements  may  both  be  either  discrete  or  continuous  though  the  most  common 
observers  used  in  practice  are  either  completely  discrete  or  have  continuous-time  system  dynamics  and 
discrete-time  measurements.  In  practice,  measurements  will  be  noisy  and  the  system  model  and  the  mea¬ 
surement  model  will  only  be  correct  up  to  modelling  errors  (“modelling  noise”).  Observers  need  to  be 
robust  to  such  uncertainties.  If  noise  processes  (from  modelling  or  measurement)  are  not  neglected,  system 
descriptions  are  based  on  stochastic  differential  or  stochastic  difference  equations.  In  the  most  general  set¬ 
ting  an  observer  then  becomes  an  estimation  procedure  for  the  conditional  density  function  relating  system 
states  to  the  time  history  of  measurements. 

For  problems  in  visual  tracking,  in  order  to  make  the  observation  problem  tractable,  we  have  proposed 
a  novel  observer  design  on  the  space  of  planar  curves  or  surfaces  in  space  (regarded  as  the  boundaries 
of  shapes),  where  the  system  model  is  continuous-time  and  the  measurements  are  discrete-time.  Special 
emphasis  was  given  to  a  geometric  formulation  of  the  observer.  The  proposed  approach  may  be  viewed  as 
a  geometric  filtering  method  for  the  class  of  computer  vision  algorithms  using  curve  evolutions  where  the 
common  observer  building  blocks  (01-03)  are  reinterpreted  in  the  context  of  dynamic  curve  evolution. 

In  our  framework,  by  combining  multiple  measurements  (e.g.,  shape  based  and  non-shape  based),  we 
can  assess  the  quality  of  measurements  locally,  and  then  locally  adapt  (e.g,  a  velocity  error  injection  gain). 
We  make  extensive  use  of  previous  work  on  establishing  correspondences  between  curves  to  transport 
measured  quantities  from  the  measurement  to  the  evolving  estimated  curve. 

2.1.2  Curves  with  Vector  Fibers  and  State  Evolution 

We  very  briefly  review  some  of  the  material  on  curve  evolution  theory  here.  We  will  be  using  closed  planar 
curves  to  represent  the  boundaries  of  objects  in  this  framework.  The  space  of  smooth  closed  planar  curves, 
denoted  by  C°°(S1\M.2),  forms  an  infinite -dimensional  manifold.  When  dealing  with  the  evolution  of 
curves,  an  additional  temporal  parameter  is  added  to  the  curve  description.  In  short,  planar  curve  evolution 
may  be  described  as  the  time-dependent  mapping:  C(p,t)  :  S 1  x  [0,  r)  i-»-  R2,  where  p  £  [0,1]  is  the 
curve  parameter,  C(p ,  t)  =  [a :(p,  t),  y(p ,  t)]T,  and  C(0,  t)  =  C{  1,  t).  Define  the  interior  and  the  exterior  of 
a  curve  C  on  the  domain  f l  C  R2  as 

int(C )  :=  {x  £  fl  :  (x  —  xc)TAf  >  0,  \/ xc  £  C}  ,  and  ext(C )  :=  fl\  int(C), 

where  A f  is  the  unit  inward  normal  to  C. 

Properly  embedding  a  manifold  in  a  larger  dimensional  space  avoids  the  need  for  parametric  repre¬ 
sentations.  Within  the  context  of  closed  curves,  C  can  be  represented  implicitly  by  a  level  set  function 
T  :  R2  x  [0,  r)  — >  R  [39],  where 

'3/  (0,  i)— 1  =  trace(C(-,t)). 

Frequently,  T  is  chosen  to  be  a  signed  distance  function.  Given  a  curve  evolution  equation 

Ct  =  v, 

where  v  is  a  velocity  vector  and  subscripts  denote  partial  derivatives,  the  corresponding  level  set  evolution 
equation  is 


T’t  +  uTV'k  =  0. 


In  order  to  define  a  dynamical  model  for  curve  evolution,  the  curve  state  space  needs  to  be  expanded  to 
include  curve  velocities.  The  corresponding  space  is  now  a  vector  bundle.  In  brief,  a  vector  bundle  is  a  fam¬ 
ily  of  vector  spaces  parameterized  by  another  space,  in  this  case  the  space  of  closed  planar  curves.  Locally, 
the  curve  vector  bundle  is  diffeomorphic  to  the  cross  product  of  the  space  of  closed  curves,  C°°(S1;  ]R2), 
and  a  given  model  vector  space,  W.  Typically,  an  element  w  GW  will  be  a  vector-valued  function  defined 
on  trace(C(-,t)).  However,  given  an  implicit  representation  for  the  curve  and  its  vector  bundle,  a  time- 
varying  vector  fiber  element  corresponding  to  the  curve  represented  by  'T  is  given  by  w  :  R2  x  [0,  r)  — >  R2, 
an  appropriately  extended  version  of  the  curve’s  fiber  element. 

When  defining  the  evolution  or  deformation  of  a  curve,  the  transport  of  the  fiber  quantities  with  the 
curve  must  also  be  defined.  The  transport  of  the  fiber  component,  w ,  in  the  implicit  representation  is 
induced  by  the  curve  evolution  through  the  advection  equation 

fut(-,0)  =  wa, 

+  Dw  ■  v  =  0, 

where  Dw  denotes  the  Jacobian  of  w,  and  Wq  are  the  fiber  quantities  being  propagated. 


2.1.3  General  Observer  Structure 


In  the  classical  observer  framework  (e.g.,  as  proposed  by  Luenberger  [31])  there  are  prediction  and  mea¬ 
surement  components.  Prediction  incorporates  the  dynamical  assumptions  made  regarding  the  plant  or, 
in  the  context  of  visual  tracking,  the  movement  of  the  object.  In  analogy  with  classical  observer  theory, 
our  proposed  observer  structure  will  contain  a  prediction  and  a  measurement  part.  To  evolve  the  overall 
estimated  curve,  the  prediction  influence  has  to  be  combined  with  the  measurement  influence,  leading  to 
the  correction  step. 

The  observer  to  be  defined  is  a  continuous/discrete  observer,  i.e.,  the  system  evolves  in  continuous  time 
with  available  measurements  at  discrete  time  instants  k  G  N^, 


(  v(C,w,t)\ 
\f  {C,  w,  t)  J 


+  w(t), 


and 


+  sfc, 


where  w  and  Sk  are  the  system  and  measurement  noises,  respectively,  C  represents  the  curve  position,  w 
denotes  additional  states  transported  along  with  C  (e.g.,  velocities),  and  (•)&  denotes  quantities  given  at 
discrete  time  points  £&. 

The  addition  of  a  prediction  model  for  the  active  contour,  a  measurement  model  for  the  active  contour, 
and  a  correction  step  to  the  evolution  and  measurement  described  above  form  the  general  observer  structure. 
The  prediction  and  measurement  models 


(1) 


simulate  the  true  system  dynamics  and  the  measurement  process,  where  the  hat  denotes  simulated  quanti¬ 
ties. 

For  simplicity,  consider  the  case  when  the  complete  state  is  measurable,  hk  =  id  (the  identity  map). 
The  proposed  continuous/discrete  observer  is 


(  v(C,w,t)\ 
\f(C,w,t)J  7 


Zk 


$ 


(2) 


where  (— )  denotes  the  time  just  before  a  discrete  measurement,  (+)  the  time  just  after  the  measurement,  $ 
is  a  correction  function  depending  on  the  gain  parameters  (a  scalar)  and  K'"’  (a  matrix),  and  (Xerr)  is 
the  error  vector  field.  Figure  1  shows  the  observer  structure  as  given  in  equation  (2).  In  what  follows,  these 
observer  components  as  applied  to  closed  curves  are  described  in  further  detail. 
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system  measurement 
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system  measurement 

noise  noise 


(a)  Classical  Euclidean  Observer  Structure. 


(b)  Symbolic  Observer  Structure  for  Curves. 


Figure  1:  General  Observer  Structure.  In  the  Euclidean  case,  simple  subtractions  and  additions  are  used  for 
estimation  error  computations  and  state  corrections.  For  curves,  estimation  errors  are  represented  through 
an  error  vector  field,  relating  the  prediction  to  the  measurement.  This  also  requires  the  computation  of 
a  homotopy  between  the  predicted  and  the  measured  curves.  The  error  vector  field  Xerr  is  computed 
through  a  correspondence  procedure.  To  do  error  correction  in  the  curve  case  requires  the  knowledge  of 
the  predicted  measurement,  the  actual  measurement,  as  well  as  the  error  vector  field  (denoted  by  the  thick 
black  input  line  to  the  system  model).  The  measurement  makes  use  of  standard  segmentation  algorithms 
and  the  system  model  is  given  by  a  chosen  prior.  (For  representational  simplicity  the  observer  structure 
figures  show  continuous  time  measurements.  In  the  proposed  discrete-time  measurement  case,  the  system 
state  only  gets  updated  at  discrete  time  instants.) 


2.1.4  Motion  Priors 

The  prediction  model  part  is  a  motion  prior,  describing  the  time-evolution  of  the  closed  curve,  and  pos¬ 
sibly  its  vector  fiber.  It  is  problem  dependent  and  should  model  as  precisely  as  possible  the  dynamics  of 
the  object(s)  to  be  tracked.  Ideally  the  measurement  part  of  an  observer  should  only  need  to  correct  for 
inaccuracies  due  to  noise.  In  practice  it  will  be  difficult  (or  even  impossible)  to  provide  an  exact  motion 
model  so  that  the  measurement  part  also  needs  to  compensate  for  inaccuracies  of  the  motion  prior. 

With  this  in  mind,  several  priors  are  described  in  this  section,  increasing  in  complexity:  the  static  prior, 
the  constant  velocity  prior,  the  quasi-dynamic  prior,  and  the  dynamic  elastic  prior.  These  should  be  pointers 
to  relatively  general-purpose  priors.  They  should  be  substituted  by  more  accurate  problem-specific  priors, 
if  available. 

The  simplest  possible  prior  is  the  static  prior,  i.e.,  no  motion  at  all,  Ct  =  0.  For  visual  tracking  the 
movement  of  a  curve  is  then  only  driven  by  the  measurement  part  of  the  observer.  Were  it  not  for  the 
measurements,  the  curve  would  stay  fixed  at  one  position.  The  next  simplest  prior  is  the  constant  velocity 
prior,  Ctt  =  0. 

Next,  suppose  that  although  the  dynamics  could  not  be  predicted  in  closed  form,  the  instantaneous 
velocity  could  be  approximated  or  somehow  estimated.  In  this  case,  the  instantaneous  velocity  information 
could  be  used  to  propagate  forward  the  curve, 

Ct  =  (XestoC  -  AO  AT, 

where  Xest  is  the  estimated  instantaneous  velocity  field.  This  is  called  the  quasi-dynamic  prior.  One 
example  of  a  quasi-dynamic  prior  would  be  to  use  the  optical  flow  vector  field  as  a  motion  prior.  The  optical 
flow  field  used  could  be  computed  using  prior  information  or  using  current  information.  Technically,  the 
latter  suggestion  is  not  a  proper  prior  in  our  framework,  since  it  implicitly  depends  on  the  current  image 
through  the  optical  flow  calculation.  Alternatively,  any  justifiable  method  resulting  in  a  flow  field  for  the 
closed  curve  could  be  used.  The  quasi-dynamic  motion  prior  would  be  useful  in  cases  where  there  is  rich 
target  motion  and  an  available  instantaneous  model  of  flow  for  the  visual  information,  but  where  a  faithful 
model  is  not  possible. 
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The  dynamic  elastic  prior  is  based  on  the  dynamic  active  curve  described  in  [37]  where  the  action 
integral  C  =  j^t()  Jq  ^|/u||Ct||2  —  oj  ||Cp||  dp  dt  is  minimized.  In  contrast  to  the  dynamic  curve  evolution 

of  [37],  a  does  not  contain  image  information,  but  is  a  design  parameter  for  curve  regularization  (it  can 
either  be  a  constant  or  a  function  over  space  and  time),  since  the  prior  should  not  depend  on  underlying 
image  information.  The  dynamic  elastic  prior  for  normal  curve  propagation  is  then  given  by: 

pCtt  =  (^-p\\Ct\\2  +  a)  kN-  (Va-Af)Af-  l-p{\\Ct\\2) ST .  (3) 

If  we  give  up  our  strict  image  independence  of  the  prior  and  allow  image  influence  as  for  the  quasi-dynamic 
optical  flow  prior,  a  could  become  image  dependent  again  and  may  be  set  to  an  image  stopping  function; 
this  transforms  equation  (3)  into  the  equation  for  the  dynamic  active  curve  [37], 

Additional  dynamic  priors  have  been  tested.  These  include  dynamic  priors  that  are  area-preserving, 
length-preserving,  smoothness-limiting,  etc.  Shape  restrictions  for  dynamic  priors  could  for  example  be 
accomplished  by  projecting  the  dynamic  evolution  onto  a  certain,  specified  shape  equivalence  class.  Fi¬ 
nally,  the  dynamical  model  may  include  additional  states.  These  could  for  example  be  local  estimates  of 
state  uncertainty,  marker  particles,  etc,  and  would  involve  an  expansion  of  the  model  vector  space  W. 

2.1.5  Measurements 

Measurements  are  used  to  drive  the  estimated  model’s  states  to  the  true  system  states.  This  is  necessary, 
since  generally  the  system  model  will  be  imperfect  and  the  designed  observer  needs  to  be  robust  with 
respect  to  disturbances.  For  visual  tracking,  it  is  difficult  to  come  up  with  accurate  motion  models,  so 
simple  approximations  need  to  suffice. 

The  predicted  measurements  are  based  on  the  current  state  of  the  observer.  Any  of  the  standard  seg¬ 
mentation  algorithms  can  be  used  to  come  up  with  the  “real”  measurement  that  the  predicted  one  has  to  be 
compared  against  to  define  the  error  measure. 

This  observer  set-up  has  two  crucial  advantages: 

•  Any  standard  (static  or  dynamic)  segmentation  algorithm  can  be  employed  for  the  measurement. 
While  the  dynamic  model  is  a  model  of  a  dynamically  evolving  curve,  the  measurement  can  utilize, 
for  example,  area-based  or  region-based  segmentation  algorithms. 

•  Static  and  dynamic  approaches  incorporating  shape  information  exist  [65],  If  these  approaches  are 
used  for  the  measurement  curve,  shape  information  can  be  introduced  into  the  infinite  dimensional 
model  without  the  need  for  the  explicit  incorporation  of  the  shape  information  into  the  dynamical 
model. 

The  inclusion  of  shape  information  as  a  constraint  on  the  measurements  contrasts  with  previous  ap¬ 
proaches  aimed  at  including  shape  information  into  the  dynamics  of  an  evolving  curve  itself  [7]  where 
motion  is  restricted  to  affine  motion.  Using  a  finite  dimensional  motion  group  reduces  the  dimensionality 
of  the  evolution  state  space.  Whereas  a  general  curve  evolution  is  infinite -dimensional,  affine  motion  con¬ 
straints  lead  to  finite -dimensional  descriptions  and  are  thus  relatively  easy  to  implement  and  usually  fast. 
To  account  for  deviations  from  the  motion  model,  correction  terms  need  to  be  introduced,  as  is  done  in  the 
deformotion  work  described  in  [65]. 

Position  measurements  can  be  accomplished  by  any  static  segmentation  method,  which  may  include 
shape  information.  Potential  candidates  are  the  classical  geodesic  active  contour  model  [8,  26],  or  the  Chan- 
Vese  functional  [9]  (see  our  discussion  in  Section  2.2.5),  or  the  information-based  approach  described  in 
Section  2.3.  Measurements  of  the  vector  fiber  quantities  are  performed  based  on  the  location  of  the  position 
measurements,  if  possible.  In  the  case  of  dynamically  evolving  curves,  velocities  need  to  be  measured  on 
the  measured  curve  via  optical  flow. 
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2.1.6  Error  Correction 


The  observer  framework  proposed  here  requires  a  methodology  to  compare  the  predicted  curve  configura¬ 
tion  to  the  measured  curve  configuration.  Minimally,  the  comparison  requires  establishing  unique  cor¬ 
respondences  between  points  on  the  measured  and  the  predicted  curve,  e.g.,  defining  a  diffeomorphic 
homotopy  between  the  two  curves.  The  homotopy  will  be  obtained  through  the  flow  of  an  error  vector 
field  defined  between  the  two  curves.  In  what  follows,  we  describe  the  propagation  of  state  information 
along  the  error  vector  field  and  the  implicit  computation  of  signed  distance  functions  from  which  the  curve 
correction  homotopy  is  defined. 

Error  Vector  Field 

The  error  vector  field  to  be  defined  is  the  manifold  analogue  to  the  measurement  residual  of  an  observer. 
Due  to  the  geometry  of  the  space  of  closed  curves,  there  is  no  unique  way  to  define  the  error  vector  field.  In 
fact,  one  could  consider  the  definition  of  the  error  vector  field  to  be  a  design  choice  in  setting  up  an  observer 
for  closed  curves.  Here,  the  method  chosen  is  a  Laplace  equation  based  approach,  whose  error  vector  field 
induced  flow  is  a  diffeomorphism,  which  is  easy  to  implement  and  fast  to  compute.  Other  approaches  to 
define  an  error  vector  field  exist,  and  in  particular,  diffeomorphic  nonlinear  registration  methodologies  [2] 
have  been  used  as  well  and  constitute  a  continuing  research  problem. 

The  variational  formulation  leading  to  the  Laplace  problem  is 

min  J  ||Vtt||2  dCl,  s.t.  trace(Co)  =  w-1(0),  trace(Ci)  =  u-1(l),  (4) 

where  Co  is  the  source  curve  (the  measurement  curve)  and  C  \  is  the  target  curve  (the  predicted  curve).  Its  so¬ 
lution  requires  careful  construction  of  the  interior  and  boundary  conditions.  The  source  curve  and  the  target 
curve  define  the  following  solution  domain  decomposition  of  the  total  space  Cl,  R  :=  int{Co)  0  int{C\), 
RPi  :=  int(Co)  IT  int(Ci),  and  Ria  :=  Cl  \  (f?  U  RPi),  where  int(C)  denotes  the  interior  of  the  curve  C  and 
0  is  the  set-symmetric  difference. 

To  simplify  the  solution  of  (4)  computationally,  we  change  the  boundary  conditions  to  0  for  the  interior 
curve  parts  ( dRPi  \  (Co  IT  Ci))  and  to  1  for  the  exterior  curve  parts  ( d  ( R  U  RPi )).  Note  that  both  the 
exterior  and  the  interior  curve  parts  may  be  composed  of  subsets  of  Co  and  C\  if  Co  and  C\  intersect. 
By  changing  the  boundary  conditions,  we  can  compose  a  continuous  solution  function  globally  over  all 
of  the  computational  domain.  The  computed  gradient  field  of  this  continuous  solution  may  be  reversed 
with  respect  to  the  original  formulation  (4),  which  can  easily  be  accounted  for  after  its  computation.  Thus 
the  modified  solution  yields  the  same  gradient  field  as  the  original  formulation,  which  is  the  quantity  of 
interest  for  the  purpose  of  the  observer  design.  Via  the  calculus  of  variations,  a  solution  to  (4)  in  the  domain 
enclosed  by  the  source  and  target  curves  with  modified  boundary  conditions  satisfies 

A us(x)  =  0,  x  £  R 


with  the  boundary  conditions 


us{x)  =  0,  a;  £  dRpi  \  (C0  IT  Cf) , 

us(x)  =  1,  x  £  d(R\J  Rpi) ,  (5) 

which  is  a  simple  reformulation  of  the  minimization  problem  (4).  To  facilitate  easy  numerical  computations 
of  the  error  vector  field,  we  extend  the  solution  to  the  remainder  of  the  image  domain,  by  solving  an 
additional  Laplace  equation  on  Ri0  and  a  Poisson  equation  on  Rp, : 

AuPi(x)  =  c  x  £  Rpi,  c  >  0 
A 0,  x  £  Riot 
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with  boundary  conditions 


Upi(x)  =  0,  x  £  dR 


pit 


The  combined  solution 


defines  the  error  vector  field  Xe 


uio{x)  —  1)  x  £  0  ( R  U  Rpi )  , 
uio(x)  =  2,  x  £  <9fl. 


(x)  —  ^  ,Up^(cc),  x  G 

s(x),  x  £  R 


pll 


Xerr{x}  • — 


'  Vtt/||Vu||,  X  £  R, 

Vtto/|| VtX0||,  X  £  Rio, 

Vui/WXuiW,  X  £  Rpi 


on  O  via  the  normalized  gradient. 

To  illustrate  the  behavior  of  Xerr  assume  the  measured  curve  as  Co  and  the  predicted  curve  as  Ci,  where 
Co  is  strictly  interior  of  Ci.  Then  Xerr  becomes  a  vector  field  flowing  the  measured  curve  into  the  predicted 
curve  at  unit  speed.  Flowing  at  unit  speed  implies  that  particles  starting  at  Co  and  flowing  according  to 
Xerr  will  reach  C\  at  different  times  (proportional  to  the  distance  covered).  The  advantage  of  the  Laplace 
based  correspondence  scheme  is  that  it  is  fast,  parametrization-free  and  allows  for  topological  changes.  The 
main  disadvantage  is  that  it  may  lead  to  unwanted  correspondences  since  it  is  not  invariant  to  translations, 
rotations  or  scale.  Thus  other  correspondence  methodologies  will  be  considered  in  our  upcoming  research 
program  [2], 

Information  Transport 

The  error  vector  field  Xerr  defined  above  will  be  used  to  geometrically  interpolate  between  two  curves, 
thereby  defining  the  curve  correction  homotopy  <fr(Xerr;  C,  C;  Kc)  and  also  inducing  a  state  correction 
homotopy  &(Xerr,  ( C ,  q),  (C,  q);  K).  Geometric  interpolation  is  achieved  by  measuring  the  distance  be¬ 
tween  correspondence  points  along  the  characteristics,  defined  by  the  error  vector  field,  that  connect  them 
and  subsequent  flow  up  to  a  certain  percentage  of  this  distance.  Performing  this  procedure  entails  solving 
a  series  of  associated  transport  equations. 

Given  that  the  curve  evolution  is  performed  implicitly,  preserving  this  feature  of  the  algorithm  requires 
implicit  information  transport  between  the  measured  and  the  predicted  curves.  Consider  the  advection 
equation  with  source  term  s  and  velocity  field  X, 

xt  +  XTXx  =  s.  (6) 

The  characteristic  curves,  c'{t )  =  X  o  c(t),  satisfy  the  ordinary  differential  equation 

^x{c{t),t)  =  s. 

Thus,  to  advect  vector  quantities  w  along  a  velocity  field  X,  solve 


fw(-,0)  =w0, 

\wt  +  Dw  ■  X  =  0, 

where  wq  denotes  the  given  vector  quantities  w  at  time  0,  i.e.,  the  initial  conditions  of  the  partial  differential 
equation  (7),  and  Dw  denotes  the  Jacobian  of  w.  The  vector  quantity  w  may  include  for  example  local 
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velocities.  To  be  able  to  interpolate  curve  positions,  which  will  subsequently  be  used  to  define  the  curve 
correction  homotopy  given  below,  distances  between  curves  need  to  be  defined.  Since  Euclidean  distances 
may  not  be  desirable  for  complicated  curve  shapes,  the  proposed  observer  framework  uses  a  distance 
measure  induced  by  a  flow  field  X  (here,  the  error  vector  field  Xerr).  Given  a  flow  field  X  and  a  particle  p 
initially  located  at  Xq,  its  travelling  distance  at  position  x  is  defined  as  the  arc-length  of  the  characteristic 
curve,  whose  tangents  are  aligned  with  X,  connecting  ato  and  x.  To  measure  traveling  distances  from  a 
complete  set  of  initial  locations,  as  specified  by  d_1(-,  0),  solve 


R.°)  =  o,  t8i 

\dt  +  ^XTXd=  1,  (8) 

where  X  ^  0  is  assumed.  Equations  (6)-(8)  are  Hamilton- Jacobi  equations.  Efficient  numerical  methods 
exist  to  compute  steady-state  solutions  of  Hamilton- Jacobi  equations  [24], 


Curve  Correction  Homotopy 


When  T  and  4-  implicitly  represent  the  curves  C  and  C  the  interpolation  can  be  accomplished  implicitly  to 
subpixel  accuracy.  In  order  to  determine  the  travelling  distance  from  each  curve  to  the  other  along  Xerr, 
compute 


dt  +  S{^)XjrrVci  =  5(4-), 
{dm)t  +  5(T-)Xj„,Wm  =  5(T-), 


d(x ,  0)  =  4-, 

dm(x,  0)  =  T, 


(9) 


where 


S(x) 


0, 

X 

Vd+x*  ’ 


if  INI  <  1, 

otherwise. 


Here,  S(x)  denotes  a  smoothed  sign  function,  which  allows  for  the  bidirectional  measurement  of  traveling 
distance,  along  Xerr  (resulting  in  positive  distance  values)  and  in  the  opposite  direction  of  Xerr  (resulting 
in  negative  distance  values).  The  results  may  be  interpreted  as  signed  distance  level  set  functions  warped 
with  respect  to  the  error  vector  field  Xerr.  The  distance  error  functions,  d  and  drn,  obtained  by  solving  the 
equation  (9)  are  interpolated  to  yield  the  interpolated  distance  function  d;, 


di  =  (1  —  a)d  +  adm ,  w  €  [0, 1], 


which  subsequently  gets  redistanced  according  to 


0)  =  di, 


arriving  at  the  corrected  distance  function  '1-,,  The  weighting  factor  a  geometrically  interpolates  the  esti¬ 
mated  and  the  measured  curve. 


Vector  Fiber  Transport 

In  order  to  compare  and  correct  the  vector  fiber  quantities,  they  need  to  be  transported  to  the  new  interpo¬ 
lated  curve,  located  somewhere  within  the  region  R.  For  the  measured  quantities  wm  and  for  the  estimated 
quantities  w  utilize  the  corresponding  transport  equations 


(pm)t  +  S(^)Xjrr\/{pm)  =  0,  Pi(x,  0)  =  Wi, 


(p)t  +  5(4-)Xj„,Vp  =  0,  p(x,  0)  =  w, 


to  propagate  the  values  throughout  the  domain. 
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Performing  the  Error  Correction 


The  error  correction  scheme  builds  on  the  framework  described  above.  We  assume  that  the  correction 
function  can  be  written  as 


(  4>c  (xerr:Ck(-),Ck:K^  \ 


The  error  correction  for  the  curve  position  is  then 

4(+)  =  $C  ( Xerr-ek{-),Ck-,K^j  , 


which  amounts  to  curve  interpolation  and  gets  computed  as 


trace(Cfc(+))  =  4(0)  \  K%  =  a  =  ak. 


State  information  needs  to  be  exchanged  and  compared  between  the  measured  and  the  estimated  curves. 
Further,  the  final  filtering  results  needs  to  be  associated  with  Ck(+).  This  is  accomplished  by  the  error 
correction  for  the  vector  fiber 

»*<+>  =  *” 

which  amounts  to  point-wise  filtering  computed  as 

(4)fc(+)  =  (Pi)fc  +  {Kij)k'W  (( Pj)k  -  ( Pj)k )  +  (Kfi*  [dm  -  (tj 


and  evaluated  at  Ck{+),  where  repeated  indices  are  summed  over  and  K'"’  is  assumed  to  be  block-diagonal 
and  decomposes  into  a  gain  matrix  for  the  fiber  quantities  (K™’w)  and  for  the  curve  position  error  (K‘"'C). 
Table  1  gives  a  description  of  the  overall  geometric  observer  algorithm. 

Finally,  we  should  note  that  the  observer  described  here  is  a  full-order  observer,  that  is,  all  states  are 
observed. 


2.1.7  Other  Results  on  Geometric  Observers 

We  have  developed  a  geometric  design  methodology  for  observing  or  filtering  dynamically  evolving  curves 
over  time.  State  measurements  are  performed  via  static  segmentations,  making  the  framework  flexible  and 
powerful,  since  any  kind  of  curve  segmentation  may  be  used  for  the  measurement  step,  including  curve 
segmentations  incorporating  shape  information.  In  this  way,  measurements  may  be  used  to  induce  shape 
information  to  the  estimated  curve  without  the  need  for  explicit  incorporation  of  shape  information  into  the 
motion  prior.  In  comparison  to  most  previous  approaches  there  is  no  finite  dimensional  motion  model.  The 
observer  framework  is  geometric, leading  to  geometrically  meaningful  observer  gains.  Further,  the  overall 
methodology  extends  to  closed  hypersurfaces  of  codimension  one  which  represent  the  boundaries  of  3D 
shapes  and  has  been  carried  out  in  the  past  year  as  part  of  our  AFOSR  work.  In  our  AFOSR  research,  we 
have  also  introduced  statistical  information  that  requires  the  notion  of  a  mean  shape  of  curves  and  curve 
covariances  as  well  as  adaptive  observer  gains  based  on  curve  statistics. 

2.2  Geometric  Particle  Filtering 

In  conjunction  with  the  geometric  observer  theory,  we  have  developed  a  geometric  particle  filter  framework 
in  [45,  46,47], 
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Geometric  observer  algorithm: 
repeat 

1)  Propagate  curve  under  the  prediction  model  (Section  2.1.4)  for  the  time- 
span 

between  two  image  measurements  (usually  given  by  the  camera  frame  rate). 

Initial  conditions  are  given  by  the  current  observer  state. 

2)  Obtain  curve  measurements  by  image  segmentation,  optical  flow,  etc.  (Sec¬ 
tion  2.1.5). 

3)  Reconcile  internal  obsen’er  state  with  the  measurements  by  error  correc¬ 
tion  (Section  2.1.6): 

a)  Establish  the  error  vector  field  (to  induce  correspondences;  Sec¬ 
tion  2.1.6). 

b)  Flow  measurements  and  observer  states  along  the  error  vector  field  (Sec¬ 
tion  2.1.6). 

c)  Perform  update  of  the  internal  observer  state  (Section  2.1.6). 
until  end  of  tracking  sequence 


Table  1 :  Description  of  the  geometric  observer  algorithm. 


The  overall  algorithm  is  based  on  three  modifications  of  the  standard  particle  filter  (PF)  [15,  3]:  (i) 
First,  we  utilize  an  importance  sampling  (IS)  density  [3]  which  can  be  considered  as  an  approximation 
to  the  optimal  IS  density  when  the  optimal  density  is  multi-modal,  (ii)  We  replace  IS  by  deterministic 
assignment  when  the  variance  of  the  IS  density  is  very  small.  Because  of  this  step,  we  are  actually  only 
sampling  on  the  6-dimensional  space  of  affine  deformations,  while  approximating  the  local  deformation  by 
the  mode  of  its  posterior.  This  is  what  makes  our  PF  algorithm  implementable  in  real  time,  (iii)  Further, 
we  proposed  a  technique  for  the  computation  of  an  approximation  to  the  mode  of  the  posterior  of  the 
local  deformation.  As  explained  in  [45,  ?,  47],  these  modifications  are  useful  to  reduce  computational 
complexity  of  any  large  dimensional  state  tracking  problem. 

2.2.1  The  Particle  Filtering  Algorithm 

This  section  describes  the  basics  of  the  proposed  method.  Fet  Ct  denote  the  contour  at  time  t  ( C*  is  repre¬ 
sented  as  the  zero  level  set  of  'Ft  (a:),  i.e.  Ct  =  {x  £  R2  :  'Ft  (at)  =  0}  [39]),  and  At  denote  a  6-dimensional 
affine  parameter  vector  with  the  first  4  parameters  representing  rotation,  skew  and  scale,  respectively,  and 
the  last  2  parameters  representing  translation.  Note  that  in  this  section  (Section  2.2)  and  in  Section  2.3 
below,  we  let  'Ft  (at)  :=  T'(at(f),  t).  Previously,  the  subscript  t  denoted  partial  derivative. 

We  employ  the  affine  parameters  (At)  and  the  contour  (Ct)  as  the  state,  i.e.,  Xt  =  [ At,Ct ]  and  treat 
the  image  at  time  t  as  the  observation,  i.e.,  Yt  =  Image (t).  Denote  by  Y\  .t  all  the  observations  until  time 
t.  Particle  filtering  [15]  allows  for  recursively  estimating  p(Xt\Yi:t),  the  posterior  distribution  of  the  state 
given  the  prior  p(Xt_i\Yi.t_i).  We  utilize  the  basic  theory  of  particle  filtering  here  as  described  in  [15]. 
The  general  idea  behind  the  proposed  algorithm  is  as  follows: 

•  Importance  Sampling:  Predict  the  affine  parameters  At  (parameters  governing  the  rigid  motion  of  the 
object)  and  perform  importance  sampling  for  Ct  to  obtain  local  deformation  in  shape,  i.e., 

•  Generate  samples  {A^\  using: 

(i) 

•  Perform  L  steps  of  curve  evolution  on  each  pf  : 

Ct(i)  =  fcE($\Yu  u[%f),  u%ef  ~  Nf(  0,  Ydef)  . 
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•  Weighting  and  Resampling:  Calculate  the  importance  weights  and  normalize  [15],  i.e., 


r,W 


p(Yt\xP)  Ptx^ix^) 


-d2(C 


<0  „(«K 


„(0 


to. 


(0 


,Yt),Ydef) 


2-/ 7  =  1 


-,(i) 


where  d2  is  any  distance  metric  between  shapes  (see  Section  2.2.7)  and  Eimage  is  any  image  based  en¬ 
ergy  functional  (see  Section  2.2.5).  Resample  to  generate  N  particles  {A>t'> ,  Cf1 }  distributed  according 
to  p(At,Ct\Yi:t).  The  resampling  step  improves  sampling  efficiency  by  eliminating  particles  with  very 
low  weights.  We  now  briefly  explain  in  detail  each  of  the  preceding  steps. 


2.2.2  The  System  and  Observation  Model 

The  problem  of  tracking  deforming  objects  can  be  separated  into  two  parts:  a)  Tracking  the  global  rigid 
motion  of  the  object;  b)  Tracking  local  deformations  in  the  shape  of  the  object,  which  can  be  defined 
as  any  departure  from  rigidity  (non-affine  deformations).  The  global  motion  (affine  transformation)  can 
be  modeled  by  the  6  parameters  of  an  affine  transformation,  At,  using  a  first  order  Markov  process.  We 
assume  that  the  local  deformation  from  one  frame  to  the  next  is  small  and  can  be  modeled  by  deformation 
in  the  shape  of  the  contour  Ct.  Thus,  the  state  vector  is  given  by  Xt  =  \Af.Ct].  The  system  dynamics  based 
on  the  above  assumption  can  be  written  as: 


At 

x 


Ct 


fpAt-x  +  ut,  ut  ~A/r(0,  IU), 


At,  1  At<2 
At,  3  At,  4 

X  + 

At,  5 

At,  6 

fdef  (/-b ;  'dt.def  , 

1 5  ^ t,def  ^ 

,  Vx  €  Ct- 1,  x  £  [if,  Ut  At(Ct- 1), 


(10) 


where  fp  models  global  rigid  motion  of  the  object  while  fdef  is  a  function  that  models  the  local  shape 
deformation  of  the  contour. 

We  further  assume  that  the  likelihood  probability,  i.e.,  probability  of  the  observation  Yt  =  Image (7) 

-g«mo?e(Ct.y t) 

given  state  Xt,  is  defined  by  p(Yt \Xt)  =  p(Yt\Ct)  oc  e  aob,  ,  where  Eimage  is  any  image  depen¬ 
dent  energy  functional  and  cr2bs  is  a  parameter  that  determines  the  shape  of  the  pdf  (probability  density 
function).  The  normalization  constant  in  the  above  definition  has  been  ignored  since  it  only  affects  the 
scale  and  not  the  shape  of  the  resulting  pdf. 

In  general,  it  is  not  easy  to  predict  the  shape  of  the  contour  at  time  t  (unless  the  shape  deformations 
are  learned  a  priori)  given  the  previous  state  of  the  contour  at  time  t  —  1,  i.e.,  it  is  not  easy  to  find  a 
good  function  fdef  that  can  model  the  shape  deformations  and  allows  one  to  sample  from  an  infinite 
(theoretically)  dimensional  space  of  curves.  Thus,  it  is  very  difficult  to  draw  samples  for  Ct  from  the 
prior  distribution.  This  problem  can  be  solved  by  doing  importance  sampling  [10],  and  is  one  of  the  main 
motivations  for  doing  curve  evolution  as  explained  in  the  following  sections.  Thus,  samples  for  At  can  be 
obtained  by  sampling  from  J\f(fpAt-\ ,  E4 )  while  samples  for  Ct  are  obtained  using  importance  sampling, 
i.e.,  we  perform  importance  sampling  only  on  part  of  the  state  space.  This  technique  of  using  importance 
sampling  allows  for  obtaining  samples  for  Ct  using  the  latest  observation  (image)  at  time  t  [59], 

The  central  idea  behind  importance  sampling  [10]  is  as  follows:  Suppose  p(x)  oc  q(x)  is  a  probability 
density  from  which  it  is  difficult  to  draw  samples  and  q(x)  is  a  density  (proposal  density  or  importance 
density)  which  is  easy  to  sample  from,  then  an  approximation  to  p(-)  is  given  by  p{x)  «  wl5( x  — 

x *),  where  wz  oc  is  the  normalized  weight  of  the  i-th  particle.  So,  if  the  samples,  xj:l\  were 

drawn  from  an  importance  density,  q(Xt\Xi-t-\,Yi:t),  and  weighted  by  wf1  oc  ,mAt  — r,  then 

wt  )s(xtl)  ~  xt)  approximates p{Xt\Y1:t). 
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In  this  proposal,  the  state  is  assumed  to  be  a  hidden  Markov  process,  i.e.. 


p(xt  |x1:t_!)  =  =  p(Yt\xt), 


and  we  further  assume  that  the  observations  are  conditionally  independent  given  the  current  state,  i.e. 
p{Yi:t\Xi:t)  =  Tlt=i  v{Yt\Xt).  Furthermore,  if  the  importance  sampling  density  is  assumed  to  depend 
only  on  the  previous  state  Xt-\  and  current  observation  Yt,  we  get  q(Xt\Xi:t_i,Yi:t)  =  q(Xt \Xt-i,  Yt). 

This  gives  the  following  recursion  for  the  weights  [10]:  =  w[l\p^Yf^X\i}p^*s  ^*~1^  .  The  impor- 

q(xt  \xt-i’Yt) 

tance  density  q(.)  and  the  prior  density  p(.)  can  now  be  written  as 

q(Xt\Xt-i,Yt)  =  p(At\At-i)  q(Ct\pt,Yt),  p(Xt\Xt-i)  =  p(At\At-\)  p(Ct\pt),  (11) 


where  q(At\At-i)  =  p(At\At-i),  since  At  is  sampled  from  p(At\At-i)  =  Af(fpAt- i,S^).  Thus,  the 
weights  can  be  calculated  from: 


w. 


(i) 


=  w. 


:<)  p(Yt\xn  p{c\ 


t- 1 


Vt  ,Yt) 


(12) 


The  probability  p(Ct\pt)  can  be  calculated  using  any  suitable  measure  of  similarity  between  shapes  (mod- 

-d2(Ct,(Xt) 

ulo  a  rigid  transformation).  One  such  measure  is  to  take  p(Ct\pt)  oc  e  ”d  ,  where  a,i  is  assumed  to 
be  very  small  such  that  it  satisfies  the  constraint  of  (10)  in  [44]  and  d2  is  any  metric  on  the  space  of  closed 
curves.  In  this  approach,  we  employ  the  distance  measure  given  in  Section  2.2.7. 


2.2.3  Approximating  the  Optimal  Importance  Density 

The  choice  of  the  importance  density  is  a  critical  design  issue  for  implementing  a  successful  particle  fil¬ 
ter.  The  proposal  distribution  q(-)  should  be  such  that  particles  generated  by  it,  lie  in  the  regions  of  high 
observation  likelihood.  One  way  of  doing  this  is  to  use  a  proposal  density  which  depends  on  the  current 
observation  [59].  The  optimal  importance  density  (one  that  minimizes  the  variance  of  the  weights  condi¬ 
tioned  on  Xt_i  and  Yt)  has  been  shown  to  be  p{Xt\Xt-i1Yt).  But  in  many  cases,  it  cannot  be  computed 
in  closed  form.  For  unimodal  posteriors,  it  can  be  approximated  by  a  Gaussian  with  mean  given  by  its 
mode,  which  is  also  equal  to  the  mode  of  p(Yt \Xt)  p(Xt\Xt_i).  In  our  case,  the  distribution  p(At\At-i) 
can  be  multi-modal,  and  hence  we  have  employed  the  following:  Sample  At  from  the  prior  state  transition 
kernel,  p(At\At-i),  and  find  the  mode  of  p(Yt\Xt)  p{Ct\pt)  to  obtain  samples  for  Ct .  Notice  that,  for 
small  deformations,  p(Yt \Xt)  p(Ct\pt )  is  indeed  unimodal  [44].  Using  (11)  and  the  likelihood  probability 
p(Yt\Xt)  defined  before,  finding  the  mode  of  p(Yt \Xt)  p{Ct\pt)  is  equivalent  to  finding  the  minimizer  of 


Etot.(Ct ,  /it,  Yt) 


Eimagei^tA  t.)  d  ipt ,  Pt  ) 


obs 


Notice  that  from  this  energy  point  of  view,  it  is  clear  why  we  can  ignore  the  partition  constants  (in  the 
definition  of  p(Yt  \ Ct)  and  p(Ct  |/tt ))  which  are  needed  to  normalize  the  various  densities  so  that  they  define 
proper  probability  measures.  Indeed,  we  are  only  interested  in  the  minimizer  of  Etot- 

Finding  the  exact  minimizer  of  Etot  for  each  particle  at  each  l  is  computationally  expensive  and  hence 
we  use  the  following  approximation:  Assuming  a  small  deformation  between  t  —  1  and  /,  both  the  terms 
in  this  summation  will  be  locally  convex  (in  the  neighborhood  of  the  minimizers  of  both  terms),  and  so 
the  minimizer  of  the  sum  will  lie  between  the  individual  minimizers  of  each  term.  Thus,  an  approximate 
solution  to  find  the  minimum  of  Etot  will  be  to  start  from  the  minimizer  of  one  term  and  go  a  certain 
distance  (i.e.,  a  certain  number  of  iterations  of  gradient  descent)  towards  the  minimizer  of  the  second.  It 
is  easy  to  see  that  C  =  /it  minimizes  the  second  term,  and  hence,  starting  with  //t  as  the  initial  guess  for 
C,  and  performing  L  iterations  of  gradient  descent  will  move  C  a  given  distance  towards  the  minimizer  of 


18 


Eimage ,  where  L  is  chosen  experimentally.  We  would  like  to  reiterate  here  that  the  optimal  choice  of  L 
will  be  one  that  finds  a  curve  C  to  minimize  Etot,  but  to  avoid  performing  the  complete  minimization  of 
Etot,  we  are  doing  this  approximation,  and  have  found  that  it  works  well  in  practice. 

Using  the  above  technique,  we  are  actually  only  sampling  on  the  6-dimensional  space  of  affine  defor¬ 
mations,  while  approximating  local  deformation  by  the  mode  of  its  posterior.  In  general,  the  “mode  tracker” 
method  described  above  reduces  the  computations  significantly. 

2.2.4  Curve  Evolution  for  Computing  Ct 

We  now  describe  how  to  obtain  samples  for  Ct  by  doing  gradient  descent  on  the  energy  functional  Eimage. 
This  operation  is  represented  by  the  function  fcE-  The  non-linear  function  fcr-:(p-.  Y,  udef)  is  evaluated 
as  follows  (for  k  =  1,2, ...,  L): 

P°  =  p,  pk  =  -  afeV ^Eimageipb-1  ,Y,udef),  fcE(p,Y,udef )  =  pL  .  (13) 

The  above  equation  is  basically  a  PDE  which  moves  an  initial  guess  of  the  contour  so  that  Eimage  is  min¬ 
imized.  Note  that  udef  ~  W(  0,  Ydef)  is  a  noise  vector  that  is  added  to  the  “velocity”  of  the  deforming 
contour  at  each  point  x  £  p  (see  [39,  25]  for  details  on  how  to  evolve  a  contour  using  level  set  represen¬ 
tation).  For  practical  examples  with  small  deformations,  Ydef  is  very  small  and  in  fact,  even  when  one 
does  not  add  any  noise  to  fcE,  there  is  no  noticeable  change  in  performance.  In  numerical  experiments, 
we  have  not  added  any  noise  to  the  curve  evolution  process.  Thus,  the  importance  sampling  density  for  At 
is  p(At\At-i)  while  that  for  Ct  is  q(Ct\pt,Yt)  =  ff(fcE(pt,  Yt),T,def  -t  0).  The  curve  Ct  thus  obtained 
incorporates  the  prediction  for  global  motion  and  local  shape  deformation. 


Alternative  Interpretation  for  /,-Iteration  Gradient  Descent 


We  perform  only  L  iterations  of  gradient  descent  since  we  do  not  want  to  evolve  the  curve  until  it  reaches 
a  minimum  of  the  energy,  Eimage ■  Evolving  to  the  local  minimizer  is  not  desirable  since  the  minimizer 
would  be  independent  of  all  starting  contours  in  its  domain  of  attraction  and  would  only  depend  on  the 
observation,  Yt.  Thus  the  state  at  time  t  would  loose  its  dependence  on  the  state  at  time  t  —  1,  and  this  may 
cause  loss  of  track  in  cases  where  the  observation  is  bad.  In  effect,  choosing  L  to  be  too  large  (taking  the 
curve  very  close  to  the  minimizer)  can  move  all  the  samples  too  close  to  the  current  observation  and  thus 
result  in  reduction  of  the  variance  of  the  samples  leading  to  “sample  degeneracy.”  At  the  same  time,  if  L  is 
chosen  to  be  too  small,  the  particles  will  not  be  moved  to  the  region  of  high  observation  likelihood  and  this 
can  lead  to  “sample  impoverishment.”  The  choice  of  L  depends  on  how  much  one  trusts  the  system  model 
versus  the  obtained  measurements.  Note  that,  L  will  of  course  also  depend  on  the  step-size  of  the  gradient 
descent  algorithm  as  well  as  the  type  of  PDE  used  in  the  curve  evolution  equation. 

Based  on  the  above  discussion,  the  importance  weights  in  (12)  can  be  calculated  as  follows: 


-EimagetC^  ,Xt)  -  d2  (C^  ) 

«)  M  p(Yt\xP)  p(Cll)\p[l))  w  e  ^  e  3 

wt  =  wt- i  - m  — -  «  wt- i - 


(c^,Yty 


( i ) 

oc  exp 


-Eb 


exp 


obs 


(14) 


where  we  have  used  the  fact  that  is  the  mean  and  Y,def  is  very  close  to  zero,  implying  that  J\T(C^ ,  Y,def  — > 
0)  can  be  approximated  by  a  constant  for  all  particles. 


2.2.5  Curve  Evolution  using  Chan-Vese  model 

Many  methods  have  been  proposed  which  incorporate  geometric  and/or  photometric  (color,  texture,  in¬ 
tensity)  information  in  order  to  segment  images  robustly  in  presence  of  noise  and  clutter.  In  our  case,  in 
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the  prediction  step  above,  fcE  can  be  any  edge-based  or  region-based  curve  evolution  equation.  In  our 
preliminary  work,  the  Mumford-Shah  functional  [34]  as  modelled  by  Chan  and  Vese  was  used  [9]  to  ob¬ 
tain  the  curve  evolution  equation.  In  Section  2.3,  we  will  describe  a  far-reaching  generalization  of  this 
segmentation  methodology  that  will  allow  us  to  track  in  much  more  complex  noisy  environments. 

For  the  Chan- Vese  model  [9],  one  applies  the  calculus  of  variations  to  minimize  the  following  energy 
Eimage- 


Eimage =  [  (/  -  Ci)2 H (^)dx  dy  +  [  (/  -  c2)2(1  -  H(^))  dx  dy  +  v  f  \VH(^)\dxdy,  (15) 
J  Cl  J  Cl  J  Cl 

where  C!,c2  and  the  Heaviside  function  // ( 'F )  are  defined  as 

=  /  I(x,y)H{^)dxdy  =  f  I(x,  y){l  -  H(^))dx  dy  ,  f  1  ^  >  0, 

1  j  H('^)dx  dy  ’  °2  f  (1  —  H(^))dx  dy  '  \0  else, 


and  finally  I(x,  y)  is  the  image  and  is  the  level  set  function.  The  energy  Eimage  can  be  minimized  by 
doing  gradient  descent  via  the  following  PDE  [9,  34]: 


<9T> 

dr 


=  5e(9) 


,div(^ 

V|V4>| ) 


-(/-Cl)2  +  (/-c2)5 


,  where  Se(s)  = 


J r(e2  +s2) 


where  r  is  the  evolution  time  parameter  and  the  contour  C  is  the  zero  level  set  of  T. 


2.2.6  Dealing  with  Multiple  Objects 

In  principle,  the  CONDENSATION  filter  [7]  could  be  used  for  tracking  multiple  objects.  The  posterior 
distribution  will  be  multi-modal  with  each  mode  corresponding  to  one  object.  However,  in  practice  it  is 
very  likely  that  a  peak  corresponding  to  the  dominant  likelihood  value  will  increasingly  dominate  over 
all  other  peaks  when  the  estimation  progresses  over  time.  In  other  words,  a  dominant  peak  is  established 
if  some  objects  obtain  larger  likelihood  values  more  frequently.  So,  if  the  posterior  is  propagated  with 
fixed  number  of  samples,  eventually,  all  samples  will  be  around  the  dominant  peak.  This  problem  becomes 
more  pronounced  in  cases  where  the  objects  being  tracked  do  not  have  similar  photometric  or  geometric 
properties.  One  may  deal  with  this  issue  using  the  method  in  [57]  by  first  finding  the  clusters  within  the 
state  density  to  construct  a  Voronoi  tessalation  [5 1  ]  and  then  resampling  within  each  Voronoi  cell  separately. 
Other  solutions  proposed  by  [54]  have  also  been  tested  for  multiple  object  tracking. 


2.2.7  Occlusions 

Prior  shape  knowledge  is  necessary  when  dealing  with  occlusions.  In  particular,  in  [66],  the  authors  incor¬ 
porate  “shape  energy”  in  the  curve  evolution  equation  to  deal  with  occlusions.  Any  such  energy  term  can 
be  used  in  the  proposed  model  to  deal  with  occlusions.  In  numerical  experiments,  we  have  dealt  with  this 
issue  in  a  slightly  different  way  by  incorporating  the  shape  information  in  the  weighting  step  instead  of  the 
curve  evolution  step,  i.e.,  we  calculate  the  likelihood  probability  for  each  particle  using  the  image  energy 
Eimage  (15)  and  a  shape  dissimilarity  measure  d2  as  follows: 


p(Yt\X^)  =  Ai 


V£f=i 


”'ibs 


+  A2 


£f=1d2(^W,^))J 


(16) 


where  Ai  +  A2  =  1,  and  d2(T'(sl,  \E>W)  is  the  dissimilarity  measure  (modulo  a  rigid  transformation)  given 

by,  d2(^W,T -W)  =  Jn(v t(A>  -  «pb))2  dx  dy,  with  /i(T)  =  f  H^]dxdy,  where  T>(s) 
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and  are  the  level  set  functions  of  a  template  shape  and  the  i- th  contour  shape,  respectively.  The 
dissimilarity  measure  gives  an  estimate  of  how  different  two  given  shapes  (in  particular,  their  corresponding 
level  sets)  may  be.  So,  higher  values  of  a !2  indicate  more  dissimilarity  in  shape.  This  strategy  may  be 
justified  by  noting  that  in  case  of  occlusion,  Eimage  will  be  higher  for  a  contour  that  encloses  the  desired 
region  compared  to  a  contour  that  excludes  the  occlusion.  Since  particle  weights  are  a  function  of  Eimage , 
the  MAP  estimate  will  be  a  particle  that  is  not  the  desired  shape.  However,  using  the  weighting  scheme 
proposed  above,  particles  which  are  closer  to  the  template  shape  are  more  likely  to  be  chosen  than  particles 
with  “occluded  shapes”  (i.e.,  shapes  which  include  the  occlusion). 

2.3  Information- Theoretic  Approach  to  Segmentation 

The  choice  of  the  segmentation  algorithm  is  very  important  to  the  success  of  our  approach  to  visual  track¬ 
ing.  We  have  considered  the  Chan-Vese  model  which  only  uses  the  first  moment  for  segmentation  for 
the  geometric  particle  filtering  just  considered.  In  our  AFOSR  work,  we  have  developed  a  far-reaching 
generalization  employing  all  of  the  statistical  moments  [33,  29], 

More  precisely,  we  address  the  problem  of  image  segmentation  by  means  of  active  contours,  whose  evo¬ 
lution  is  driven  by  the  gradient  flow  derived  from  an  energy  functional  that  is  based  on  the  Bhattacharyya 
distance.  Because  of  the  broad  statistical  nature  of  the  flow,  we  believe  that  it  may  be  very  useful  for  target 
tracking  and  has  naturally  been  incorporated  into  the  geometric  observer  framework  described  above.  In 
particular,  given  the  values  of  a  photometric  variable,  which  is  to  be  used  for  classifying  the  image  pixels, 
the  active  contours  are  designed  to  converge  to  the  shape  that  results  in  maximal  discrepancy  between  the 
empirical  distributions  of  the  photometric  variable  inside  and  outside  of  the  contours.  This  discrepancy  is 
measured  by  means  of  the  Bhattacharyya  distance  that  proves  to  be  an  extremely  useful  tool  for  solving  the 
problem  at  hand.  The  proposed  methodology  can  be  viewed  as  a  generalization  of  the  segmentation  meth¬ 
ods,  in  which  the  active  contours  maximize  the  difference  between  a  finite  number  of  empirical  moments 
of  the  “inside”  and  “outside”  distributions. 

2.3.1  Bhattacharyya  Flow 
Bhattacharyya  Distance 

In  order  to  facilitate  the  discussion,  we  just  consider  the  case  of  two  classes  (i.e.,  the  problem  of  segmenting 
an  object  of  interest  from  the  background),  followed  by  describing  the  extension  of  the  methodology  to 
multi-object  scenarios. 

In  the  two  class  case,  the  segmentation  problem  is  reduced  to  the  problem  of  partitioning  the  domain 
of  definition  12  C  R2  of  an  image  I{z)  (with  z  £  12)  into  two  mutually  exclusive  and  complementary 
subsets  12_  and  12+ .  These  subsets  can  be  represented  by  their  respective  characteristic  functions  x~  and 
X+,  which  can  in  turn  be  defined  by  means  ofalevel  set  function  \I2(z)  :  12  — >  R.  as  X-(z)  :=  H(—'S(z)), 
X+(z)  '■=  H{'i/(z))  with  z  £  12,  where  H  denotes  the  Heaviside  function. 

As  above,  given  a  level  set  function  its  zero  level  set  {z  |  ^(2)  =  0,  z  £  12}  is  used  to  implicitly 
represent  a  curve.  For  the  sake  of  concreteness,  we  associate  the  subset  12_  with  the  support  of  the  object  of 
interest,  while  12+  is  associated  with  the  support  of  corresponding  background.  In  this  case,  the  objective  of 
active  contour  based  image  segmentation  is  given  an  initialization  Tq(2),  construct  a  convergent  sequence 
of  level  set  functions  {'f,t(2)}t>0  (with  \Ft(,2:)|t_o  =  '^o(-2))  such  that  the  zero  level  set  of  T ’t{z)  coincides 
with  the  boundary  of  the  object  of  interest  for  some  T  >  0. 

We  construct  the  sequence  of  level  set  functions  via  a  gradient  flow  that  minimizes  a  properly  defined 
cost  functional.  In  our  approach,  the  latter  is  derived  in  the  following  manner.  First,  the  image  to  be 
segmented  I(z)  is  transformed  into  a  vector-valued  image  of  its  local  features  J(z).  Note  that  the  feature 
image  J(z)  ascribes  to  every  pixel  of  I  (2)  a  TV-tuple  of  associated  features,  and  hence  it  can  be  formally 
represented  as  a  map  from  12  to  Rw.  Subsequently,  given  a  level  set  function  (2),  the  following  two 
quantities  are  computed: 

p  /_  |  =  fnK~(x-J(z))X-(z)dz  fnK_(x-J(z))H(-V(z))dz 

{  '  {))  InX-(z)dz  faH(-9(z))dz  ’  (?) 
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and 


(18) 


(*!*(*)) 


fnK+(x-J(z))x+(z)  dz 
fnX+(z)  dz 


JuK+(x-3(z))H(^{z))dz 

fnff(*(z))dz 


where  x  £  RA\  and  K_  (x)  and  K+(x)  are  two  scalar-valued  functions  having  compact  or  effectively 
compact  supports.  Provided  that  the  kernels  A'_  (x)  and  K+(x)  are  normalized  to  have  unit  integrals,  the 
functions  P_(x  |  and  P+(x  |  ’I'(z))  given  by  (17)  and  (18)  are  kernel-based  estimates  of  the  proba¬ 
bility  density  functions  pdf  of  the  image  features  observed  over  the  sub-domains  fi_  and  fl+,  respectively. 

The  core  idea  of  the  our  method  is  quite  intuitive  and  it  is  based  on  the  assumption  that,  for  a  properly 
selected  subset  of  image  features,  the  “overlap”  between  the  informational  contents  of  the  object  and  of 
the  background  has  to  be  minimal.  In  other  words,  if  one  thinks  of  the  active  contour  as  a  discrimina¬ 
tor  that  separates  the  image  pixels  into  two  subsets,  then  the  optimal  contour  should  minimize  the  mutual 
information  between  these  subsets.  Note  that  for  the  case  at  hand,  minimizing  the  mutual  information  is 
equivalent  to  maximizing  the  Kullback-Leibler  divergence  between  the  pdf’s  associated  with  the  “inside” 
and  “outside”  subsets  of  pixels.  For  the  reasons  discussed  below,  however,  instead  of  the  divergence,  we 
have  developed  a  method  to  maximize  the  Bhattacharyya  distance  between  the  pdf’s.  (The  Bhattacharyya 
distance  is  defined  to  be  —  log  of  the  integral  given  in  (20)  below  which  defines  the  Bhattacharyya  coeffi¬ 
cient.)  Specifically,  the  optimal  active  contour  is  defined  as: 


T^O)  =  arg  inf  {73(T'(^))}, 

*(*) 


(19) 


where 

B(^(z))=[  s/P-{x\  3?{z))P+{x\  3/(z))dx, 

JxeRN 


with  P_  (x  |  (1/(2:))  and  P+(x  |  *1/(2))  being  given  by  the  equations  (17)  and  (18),  respectively. 


(20) 


Gradient  Flow 

In  order  to  derive  a  scheme  for  minimizing  (20),  we  need  to  compute  its  first  variation.  Accordingly,  the 
first  variation  of  _B(\I/(z))  (with  respect  to  ^(2))  is  given  by: 


=  I  f  ( dP-(x\*(z))  P+(x\*{z))  dP+{x\*(z))  P-{x\nz))\  , 

5f{z)  2  d*{z)  y  (at  1^(2))  d*(z)  ]jp+(xmz)))  ' 

(21) 


Differentiating  (17)  and  (18)  with  respect  to  \1/ (2).  one  obtains: 

dP-(x  |  ^(2))  ( P-{x  |  ^(2))  -  K_ (x  —  J (2)) 

9*w  =  {(®W)  ( - x - 

and 

dP+(x  |  (2))  ( K+(x  -  3{z))  -  P+(x  |  ^(2)) 

a*(2)  =  4<*w)  ( - X - 

where  <5(-)  is  the  delta  function,  and  and  A+  are  the  areas  of  f2_  and  f2+  given  by  fQ  X-(z)  dz  and 
fn  x+(z)  dz,  respectively. 

By  substituting  (22)  and  (23)  in  (21)  and  combining  the  corresponding  terms,  one  can  arrive  at: 


8B(*(z)) 

S'lt(z) 


6(*(z))V(z), 


(24) 
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where 


V{z)  =  -B(*(z)){AZl-A+1)+ 


(25) 


1 xeRN 


K+(x-3(z))  — 


l  /P-(»  I  *(*)).  1 
A+V  P+(x\*(z))  2. 


<xem»K-(X  3(Z))  A_V  P_(x\*(z)) 


dx. 


Assuming  the  same  kernel  K(x)  is  used  for  computing  the  last  two  terms  in  (25),  i.e.  K(x)  = 
K-(x)  =  K+(x),  the  latter  can  be  further  simplified  to  the  following  form: 


V(z)  =  l-B{^(z)){A-_l  -  A;1)  +  \  f  K(x  -  J(z))  L(x  |  *(*))  dx, 
z  z  JajeR-" 


(26) 


where 


L(x\^{z)) 


J_  jP-(x  l*(z))  1  Ip+(x\*(z)) 

A+y  P+(x\*(z))  A_\j  P_{x\^{z)Y 


(27) 


Introducing  an  artificial  time  parameter  t,  the  gradient  flow  of  d/  (z)  that  minimizes  (20)  is  given  by: 


M*)  =  -5B§^i[Z))  =  S^(z))V(z),  (28) 

where  the  subscript  t  denotes  the  corresponding  partial  derivative,  and  V (z)  is  defined  as  given  by  either 
(25)  or  (26). 

From  the  viewpoint  of  statistical  estimation,  the  cost  function  (20)  can  be  thought  of  as  accounting 
for  the  fidelity  of  estimation  of  the  optimal  level  set  function  to  observed  features  of  I(z).  However,  this 
cost  function  does  not  take  into  consideration  some  plausible  properties  of  the  optimal  solution,  and,  as 
a  result,  minimizing  (20)  alone  could  be  too  sensitive  to  measurement  noises  and/or  errors  in  the  data. 
In  order  to  alleviate  this  sensitivity,  one  can  attempt  to  filter  out  the  spectral  components  of  the  solution 
which  belong  to  the  noise  subspace.  Such  filtering  can  be  conveniently  implemented  using  the  Bayesian 
estimation  framework,  which  allows  one  to  find  the  most  likely  solution  given  both  the  observed  data  and 
a  reasonable  assumption  regarding  the  nature  of  optimal  '['(z).  One  can  also  regularize  the  solution  via 
constraining  the  length  of  the  active  contour,  in  which  case  the  optimal  level  set  dr*  (a:)  is  given  by: 


d-*^)  =  arg  inf  {B{^(z))  +  ot  [  ||  Vi7(d'(x))||  dz  \  ,  (29) 

*(*)  l  J o  J 

where  V  denotes  the  operator  of  gradient,  ||  •  ||  is  the  Euclidean  norm,  and  a  >  0  is  a  regularization 
constant,  which  controls  the  compromise  between  fidelity  and  stability.  The  gradient  flow  associated  with 
minimizing  the  cost  functional  in  (29)  can  be  shown  to  be  equal  to: 

d'tfz)  =  (^(z))  (an  -  V(z)) ,  (30) 

where  k  is  the  curvature  of  the  active  contour.  All  the  segmentation  results  reported  in  the  present  study 
have  been  obtained  via  numerically  solving  (30)  with  the  value  of  a  set  to  be  equal  to  1 . 


2.3.2  Relationship  with  Kullback-Leibler  Divergence 

There  exists  a  classical  way  to  access  similarity  between  the  informational  contents  of  two  segmentation 
classes  by  means  of  mutual  information.  In  our  case,  using  the  mutual  information  is  equivalent  to  using 
the  Kullback-Laibler  divergence,  whose  symmetrized  version  is  defined  as: 


Pi(x)  log 


P2(x) 

Pl(x) 


P2(x)  log 


Pi  0*0 

P2  0*0 


dx. 


(31) 
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with  pi(x)  and  p^{x)  being  the  pdf’s  associated  with  the  segmentation  classes.  Note  that  the  Kullback- 
Laibler  divergence  can  be  turned  into  the  Bhattacharyya  coefficient  under  the  substitution  of  the  square 
root  function  for  the  logarithm  function  in  (31): 


The  above  similarity  between  the  two  criteria  raises  the  question  of  the  advantages  and  disadvantages  of 
using  one  criterion  with  respect  to  the  other.  Obviously,  one  of  the  advantages  of  using  the  Bhattacharyya 
coefficient  is  related  to  its  analytical  form,  which  is  much  simpler  than  that  of  the  Kullback-Leibler  diver¬ 
gence.  As  a  result,  the  analytical  derivations  related  to  the  former  are  usually  much  neater,  shorter,  and 
easier  to  interpret  as  compared  to  the  latter. 

Yet  another,  much  more  crucial  difference  between  the  criteria  in  (31)  and  (32)  stems  from  the  prop¬ 
erties  of  the  functions  they  employ.  Specifically,  the  logarithm  has  an  extremely  high  sensitivity  to  the 
variations  of  its  argument  for  relatively  small  values  of  the  latter  (not  to  mention  that  it  is  undefined  at 
zero).  In  the  case  of  non-parametric  density  estimation,  the  above  property  of  the  logarithm  is  obviously 
a  disadvantage,  as  it  makes  the  Kullback-Leibler  divergence  prone  to  the  errors  caused  by  inaccuracies  in 
estimating  the  tails  of  probability  densities.  On  the  other  hand,  the  square  root  is  a  well-defined  function 
in  the  vicinity  of  zero.  Moreover,  for  relatively  small  values  of  its  argument,  the  variability  of  the  square 
root  is  considerably  smaller  than  that  of  the  logarithm.  As  a  result,  the  Bhattacharyya  coefficient  is  less 
susceptible  to  the  influence  of  inaccuracies  mentioned  above. 


2.3.3  Examples  of  Discriminative  Features 


In  this  subsection,  a  number  of  possible  definitions  of  the  feature  vector  are  discussed.  The  set  of  examples 
given  below  is  by  no  means  complete,  but  it  rather  represents  the  features  frequently  used  in  practice. 

Formally,  the  transition  from  the  data  image  I(z)  :  12  — >  R.  to  the  vector- valued  image  J(z)  :  fl  — >  Kw 
of  its  features  can  be  described  by  a  transformation  W{-}  applied  to  I(z),  i.e.,  J(z)  =  W{I(z)}.  Hence, 
the  question  of  selecting  a  useful  set  of  image  features  is  essentially  equivalent  to  the  question  of  defining 
a  proper  transformation  VV.  Perhaps,  the  simplest  choice  here  is  to  define  the  feature  image  J(z)  to  be 
identical  to  I(z),  which  corresponds  to  the  case  of  VV  being  identity  and  N  =  1.  In  this  case,  the  features 
used  for  the  classification  are  the  gray-levels  of  / ( z ),  and  the  resulting  segmentation  procedure  is  essentially 
histogram  based  [14], 

Although  the  above  choice  has  proven  useful  in  numerous  practical  settings,  it  is  definitely  not  the  best 
possible  for  situations  when  both  object  and  background  have  similar  intensity  patterns.  In  this  case,  it 
seems  reasonable  to  take  advantage  of  a  relative  displacement  of  these  patterns  with  respect  to  each  other 
via  transforming  I(z)  to  the  space  of  its  partial  derivatives.  This  transformation  is  performed  by  setting 
W  =  V,  in  which  case  the  feature  space  becomes  two-dimensional,  i.e.,  N  =  2. 

As  a  next  logical  step,  one  can  smooth  the  above  gradient  V/(z)  using  a  set  of  low-pass  filters  with 
progressively  decreasing  bandwidths.  This  construction  brings  us  directly  to  the  possibility  to  define  W  to 
be  a  wavelet  transform  [32],  Note  that,  in  this  case,  each  pixel  of  the  resulting  J(z)  carries  information  on 
multiresolution  (partial)  derivatives  of  I(z).  It  should  be  noted  that  using  the  wavelet  coefficients  as  dis¬ 
criminative  features  for  image  segmentation  has  long  been  successfully  used  in  numerous  applications  [58]. 

The  dependency  structure  between  the  partial  derivatives  of  l  (z)  can  be  captured  by  the  structural 
tensor  defined  as: 


T(z) 


(' dZlI(z ))2  dZlI(z)dZ2I(z) 
dzJ{z)dzJ{z)  (dZ2I(z))2 


(33) 


with  z  =  (z\ ,  zf),  and  dZl  and  dZ2  denoting  the  corresponding  partial  derivatives.  In  this  case,  the  feature 
space  is  three  dimensional,  as  for  each  z,  J  (z)  is  defined  to  be  equal  to 


[( dZlI(z ))2,  dZlI(z)dZ2I(z),  {dZ2I(z))2  . 
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Note,  however,  that  this  choice  of  the  feature  space  is  not  linear  since  the  structural  tensor  (33)  is  positive 
definite,  and  therefore  the  pixels  of  J  (z)  lie  on  a  non-linear  manifold.  Fortunately,  the  availability  of  kernel- 
based  methods  for  estimating  probability  densities  defined  over  non-linear  manifolds  makes  it  possible  to 
apply  our  approach  in  this  situation  as  well. 

Another  interesting  choice  of  J(z)  can  be  followed  in  the  scenarios,  in  which  I (z)  appears  as  an  element 
of  a  sequence  of  tracking  images.  In  this  case,  J(z)  :  fi  — >■  R2  can  be  defined  to  be  the  vector  field  of  local 
displacements  of  the  gray-levels  of  I(z).  Specifically,  let  dt,I(z)  denotes  the  temporal  (partial)  derivative 
of  I(z).  Let  also  G(z)  be  a  two  component,  column  vector  with  its  first  and  second  components  equal  to 
dZlI{z)  dtI(x)  and  dZ2I(z)  dtI{x),  respectively.  Then,  the  gray-level  constancy  constraint  can  be  shown 
to  result  in  the  following  least  square  solution  for  the  local  displacement  field  v(z): 

v(z)  =  —  T_1(,z)  G(z),  Vz  €  fl,  (34) 

with  T(z)  being  the  structure  tensor  given  by  (33).  Consequently,  in  the  case  when  the  motion  of  the 
tracked  object  is  independent  of  that  of  its  background,  the  feature  image  can  be  defined  by  setting  J (z)  = 
v(z).  We  note  that  this  choice  seems  to  be  reasonable  for  many  tracking  scenarios,  where  the  background 
motion  is  either  negligible  or  associated  with  the  ego-motion  of  camera,  which  rarely  correlates  with  the 
dynamics  of  tracked  objects. 

The  local  moments  of  I(z)  [56],  multiresolution  versions  thereof,  and  the  local  fractal  dimension  are 
among  many  other  image  features,  which  could  be  used  for  segmentation.  Note  that  a  combined  use 
of  all  the  features  mentioned  above  is  also  possible.  Thus,  for  example,  by  letting  the  feature  vector 
x  =  (x\,  X2,  xf)  be  composed  of  the  intensity  Xi  and  local  velocity  components  X2,  £3 ,  one  can  perform 
segmentation  based  on  both  gray-scale  and  motion  information. 

2.3.4  Kernel  Based  Estimation 

The  conditional  densities  p{x  \  x  £  f2_)  and  p(x  \  x  £  fl_)  may  be  estimated  using  the  kernel  estimation 
method.  In  this  subsection,  we  briefly  consider  the  problem  of  defining  the  kernel’s  bandwidth,  which 
should  be  consistent  with  the  data  size  for  the  estimates  to  be  reliable. 

In  the  discrete  setting,  kernel  density  estimation  amounts  to  approximating  the  (unknown)  pdf  p{x)  of 
an  TV-dimensional  random  vector  x  according  to: 

1  " 

f{x)  =  ~y^,K(r{x-xl),  (35) 

i=l 

where  {aV}”=1  are  n  independent  realizations  of  x.  In  (35),  the  kernel  Kcr(x)  is  parameterized  by  a  vector 
cr  and  has  a  unit  integral,  i.e.  fRN  Kcr(x)  dx  =  1. 

A  number  of  possible  definitions  of  the  kernel  Kcr(x )  are  possible,  among  which  the  most  popular  to 
take  K<j(x)  to  be  a  Gaussian  pdf,  and  this  is  the  choice  followed  in  this  proposal.  In  order  to  facilitate  the 
numerical  implementation  of  the  density  estimation,  we  use  a  separable  (isotropic)  form  of  the  Gaussian 
pdf,  in  which  case  Kcr(x)  is  defined  as: 


N 

K(r(x )  =  (27 t)-n/2  cr"1  exp  {-xl/2al},  (36) 

fc= 1 

where  X\,X2,  ■  ■  ■  ,Xn  are  the  coordinates  of  x,  and  the  standard  deviations  cr  =  (cr/,.}()(=1  control  the 
extension  of  Ka(x)  along  each  of  the  independent  directions.  It  should  be  noted  that  the  separability  of 
the  kernel  in  (36)  does  not  imply  the  separability  of  the  estimate  f(x)  that  is  now  given  by: 

1  n 

fix)  =  -  V  Kai  (x  -  x\)  I<a2  (x  —  xlf]  ■  ■  ■  Kgn  (x  -  xjy),  (37) 

n  ' 
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with  ICak  =  (\j2ita2k)  1  exp  {-a;£/2o|}  for  k  €  {1, 2, . . . ,  N}. 

The  kernel  method  of  density  estimation  has  proven  valuable  in  numerous  applications.  It  is  well 
known,  however,  that  effective  use  of  this  method  requires  proper  choice  of  the  bandwidth  parameters  crk. 
When  insufficient  smoothing  is  done  (i.e.,  the  bandwidth  parameters  are  too  small),  the  resulting  density 
estimate  is  too  rough  and  contains  spurious  features  that  are  artifacts  of  the  sampling  process.  On  the 
other  hand,  when  excessive  smoothing  is  done,  important  features  of  the  underlying  structure  are  smoothed 
away.  Consequently,  to  optimize  the  accuracy  of  the  estimates  in  (17)  and  (18),  the  bandwidth  parameters 
{ak)k- 1  should  be  properly  defined.  This  will  be  a  topic  of  investigation  in  our  research,  and  in  particular 
we  are  considering  an  expectation  maximization  based  approach  for  this  purpose. 


2.3.5  Multi-Object  Case 

In  order  to  extend  the  applicability  of  the  proposed  method  to  the  case  when  images  are  composed  of  more 
than  two  segmentation  classes,  a  multi-class  version  will  now  be  described.  In  particular,  in  such  a  scenario, 
the  image  domain  is  considered  to  be  a  union  of  M  (mutually  exclusive)  subdomains  {Ofc  }k[^  t ,  each  of 
which  is  associated  with  a  corresponding  (conditional)  pdf  pk{x).  Consequently,  the  first  step  to  be  done 
is  to  generalize  the  definition  of  the  Bhattacharyya  coefficient  to  the  case  of  M  densities.  Such  a  natural 
generalization  is  known  as  the  average  Bhattacharyya  coefficient  as  given  by  [22]: 


The  above  coefficient  represents  a  cumulative  measure  of  discrepancy  between  all  the  possible  pairs  of  the 
densities  under  consideration.  Note  that  the  normalization  constant  2 /(M(M  —  1))  guarantees  that  the 
coefficient  (38)  takes  its  values  in  the  interval  between  0  and  1. 

The  additivity  of  the  construction  in  (38)  makes  it  trivial  to  derive  the  corresponding  gradient  flow. 
In  fact,  each  additive  term  of  the  cost  functional  can  be  differentiated  independently  so  that  the  resulting 
gradient  flow  is  given  as  a  sum  of  the  gradient  flows  related  to  the  additive  components  in  (38).  In  order  to 
complete  the  multi-class  formulation  of  the  Bhattacharyya  flow,  we  will  need  to  briefly  describe  how  one 
would  use  level  sets  in  this  context. 

Obviously,  in  the  multi-class  case,  using  only  one  level  set  function  would  be  insufficient  to  solve  the 
problem  at  hand.  Consequently,  we  follow  the  multiphase  segmentation  formulation  of  [63]  that  uses  p 
level  set  functions  which  are  capable  of  segmenting  the  image  I(z)  into  (up  to)  2V  regions. 

In  particular,  since  one  level  set  function  partitions  the  image  domain  fl  into  two  sub-domains,  p  level 
set  functions  can  partition  into  2P  sub-domains,  each  of  which  is  labeled  by  the  signs  of  the  level  set 
functions  {'I,i(z)}f_1  in  that  sub-domain.  The  formulae  for  the  multiphase  gradient  flow  corresponding  to 
(38)  can  be  easily  obtained  by  plugging  the  results  of  Section  2.3.1  into  the  “templates”  derived  in  [63], 
mutatis  mutandis. 


2.3.6  Information-Theoretic  Segmentation  and  Blind  Source  Separation 

We  have  pursued  a  number  of  directions  for  the  use  of  segmentation  in  visual  tracking.  First  of  all,  there 
is  a  similarity  between  the  problem  of  image  segmentation  by  means  of  active  contours  and  the  problem 
of  blind  source  separation  as  a  specific  instance  of  independent  component  analysis  (ICA)  [20].  The  latter 
is  a  reconstruction  problem,  in  which  a  number  of  unknown  source  signals  have  to  be  recovered  from 
measurements  of  their  algebraic  mixtures.  This  problem  has  inspired  the  proposal  of  numerous  solutions, 
many  of  which  are  based  on  finding  the  directions  -  independent  components  -  in  a  multi -dimensional  space, 
along  which  the  mixtures  (or  the  projections  thereof)  are  as  independent  as  possible.  In  this  connection, 
a  number  of  information-based  criteria,  such  as  the  Kullback-Leibler  divergence,  have  been  employed. 
Returning  to  the  problem  of  segmentation,  one  can  think  of  a  data  image  as  a  geometric  mixture  of  sources, 
i.e.,  of  segmentation  classes.  In  such  a  case,  the  active  contour  acts  in  a  certain  sense  as  an  independent 
component  when  trying  to  separate  the  image  into  as  many  independent  regions  as  possible.  It  is  quite 
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interesting  to  note  that  the  method  proposed  here  employs  the  concept  and  tools,  which  are  very  much 
similar  to  those  used  in  ICA.  This  has  been  a  nice  research  direction  in  the  just  completed  AFOSR  program. 
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